
ML-Density Estimation

Nonparametric Density Estimation

For a random vector $\mathbf{x}$ drawn from an unknown distribution $p(\mathbf{x})$, the probability that it falls into a small region $R$ of the space is

$$P = \int_R p(\mathbf{x})\,d\mathbf{x}$$

Given $N$ training samples, the number of samples $K$ falling into the region $R$ follows a binomial distribution:

$$P_K = \binom{N}{K} P^K (1 - P)^{N - K}$$

When $N$ is very large, the fraction of samples falling into $R$ concentrates around its expectation, so we can approximate

$$P \approx \frac{K}{N}$$

Assuming $R$ is small enough that $p(\mathbf{x})$ is approximately constant within it, and letting $V$ denote the volume of $R$:

$$P \approx p(\mathbf{x})\,V$$

Combining the two approximations gives the final estimate for $p(\mathbf{x})$:

$$p(\mathbf{x}) \approx \frac{K}{NV}$$

To estimate $p(\mathbf{x})$ accurately, $N$ must be large and $V$ as small as possible. In practice the number of samples is limited, and if the region is too small, few samples fall into it and the estimated density becomes unreliable.
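The estimate $p(\mathbf{x}) \approx K/(NV)$ can be checked numerically. A minimal sketch, assuming samples from a standard normal so the true density is known; the region half-width $h$ and the evaluation point are illustrative choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# N samples from a standard normal, whose true density at 0 is 1/sqrt(2*pi) ≈ 0.3989.
N = 100_000
samples = rng.standard_normal(N)

# Small region R = [x - h, x + h] around the evaluation point.
x, h = 0.0, 0.05
V = 2 * h                                       # "volume" (length) of R in one dimension
K = np.count_nonzero(np.abs(samples - x) < h)   # samples falling into R

p_hat = K / (N * V)
print(p_hat)                                    # close to the true value 0.3989
```

Shrinking $h$ with $N$ fixed makes the estimate noisier, exactly the trade-off described above.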

Two strategies follow from the estimate $p(\mathbf{x}) \approx K/(NV)$:

  • Fix the region size $V$ and count how many samples fall into each region — this gives the histogram and kernel methods.
  • Fix the count $K$ and grow each region until it contains exactly $K$ samples — this is the K-nearest-neighbour method.
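The K-nearest-neighbour idea can be sketched in a few lines. This is an illustrative implementation, not from the notes; the choice $K = 50$ and the standard-normal test data are assumptions:

```python
import numpy as np

def knn_density(x, samples, K):
    """Fix K; grow the region around x until it contains exactly K samples."""
    dists = np.sort(np.abs(samples - x))
    r = dists[K - 1]            # distance to the K-th nearest sample
    V = 2 * r                   # 1-D "volume": the interval [x - r, x + r]
    return K / (len(samples) * V)

rng = np.random.default_rng(1)
samples = rng.standard_normal(10_000)
print(knn_density(0.0, samples, K=50))   # roughly 0.4 for a standard normal
```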

Histograms as density models

For low-dimensional data we can use a histogram as a density model.

Histograms

  • How wide should the bins be? (the bin width acts as a regulariser)
  • Do we want the same bin-width everywhere?
  • Do we believe the density is zero for empty bins?
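As a sketch of the histogram approach (using NumPy; the bin count and test data are arbitrary choices), each bar height is $K_i/(N \cdot \text{width})$, so the bars integrate to one:

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.standard_normal(5_000)

# density=True normalises counts to K_i / (N * bin_width).
heights, edges = np.histogram(samples, bins=30, density=True)

width = edges[1] - edges[0]
print(np.sum(heights * width))   # the histogram integrates to 1.0
```

Changing `bins` is exactly the bin-width question from the bullets above: few wide bins oversmooth, many narrow bins leave empty bins with estimated density zero.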

Kernel Density Estimation (KDE)

1. Definition

Kernel Density Estimation (KDE) is a non-parametric method to estimate the probability density function (PDF) of a random variable.


2. KDE Formula

$$p(x) = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{\sqrt{2\pi}\,H}\exp\!\left(-\frac{(x - x_n)^2}{2H^2}\right)$$

  • $p(x)$: estimated density at point $x$.
  • $N$: total number of data points.
  • $H$: bandwidth, controlling the smoothness of the estimate.
  • $x_n$: the data points.
  • The kernel function $\phi$ is typically Gaussian:

    $$\phi\!\left(\frac{x - x_n}{H}\right) = \frac{1}{\sqrt{2\pi}\,H}\exp\!\left(-\frac{(x - x_n)^2}{2H^2}\right)$$

3. Steps to Compute KDE

  1. For each data point xn, calculate the distance from the target point x.
  2. Apply the kernel function to determine the weight of each data point.
  3. Sum the contributions from all data points and normalize by N.
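The three steps translate directly into NumPy. A minimal vectorised sketch (the function name and the standard-normal test data are mine, not from the notes):

```python
import numpy as np

def kde(x, data, H):
    """Gaussian kernel density estimate at the points in x."""
    x = np.atleast_1d(x)[:, None]          # shape (M, 1), broadcasts against data
    z = (x - data) / H                     # step 1: scaled distances to each x_n
    phi = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * H)   # step 2: kernel weights
    return phi.mean(axis=1)                # step 3: sum contributions, divide by N

rng = np.random.default_rng(0)
data = rng.standard_normal(1_000)
print(kde(0.0, data, H=0.3))               # close to 0.4 for a standard normal
```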

4. Example

Data

We have 5 data points:

$x = \{1.0,\ 1.5,\ 2.0,\ 3.0,\ 3.5\}$

We want to estimate the density at $z = 2.5$, using:

  • bandwidth $H = 0.5$,
  • a Gaussian kernel.

Calculation

For each $x_n$, calculate:

$$\phi\!\left(\frac{z - x_n}{H}\right) = \frac{1}{\sqrt{2\pi}\,H}\exp\!\left(-\frac{(z - x_n)^2}{2H^2}\right)$$

  • For $x_1 = 1.0$:

    $$\phi\!\left(\tfrac{2.5 - 1.0}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 1.0)^2}{2\cdot 0.5^2}\right) \approx 0.008$$

  • For $x_2 = 1.5$:

    $$\phi\!\left(\tfrac{2.5 - 1.5}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 1.5)^2}{2\cdot 0.5^2}\right) \approx 0.107$$

  • For $x_3 = 2.0$:

    $$\phi\!\left(\tfrac{2.5 - 2.0}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 2.0)^2}{2\cdot 0.5^2}\right) \approx 0.483$$

  • For $x_4 = 3.0$:

    $$\phi\!\left(\tfrac{2.5 - 3.0}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 3.0)^2}{2\cdot 0.5^2}\right) \approx 0.483$$

  • For $x_5 = 3.5$:

    $$\phi\!\left(\tfrac{2.5 - 3.5}{0.5}\right) = \frac{1}{\sqrt{2\pi}\cdot 0.5}\exp\!\left(-\frac{(2.5 - 3.5)^2}{2\cdot 0.5^2}\right) \approx 0.107$$
Combine Contributions

The total density at $z = 2.5$ is:

$$p(z) = \frac{1}{N}\sum_{n=1}^{N}\phi\!\left(\frac{z - x_n}{H}\right)$$

Substituting the values:

$$p(z) = \frac{1}{5}(0.008 + 0.107 + 0.483 + 0.483 + 0.107) = \frac{1}{5}\cdot 1.188 \approx 0.238$$
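The hand calculation can be reproduced numerically as a quick check, using the same five points, $z = 2.5$, and $H = 0.5$:

```python
import numpy as np

data = np.array([1.0, 1.5, 2.0, 3.0, 3.5])
z, H = 2.5, 0.5

# One Gaussian kernel value per data point.
phi = np.exp(-((z - data) ** 2) / (2 * H**2)) / (np.sqrt(2 * np.pi) * H)
print(phi.mean())   # ≈ 0.2385, consistent with the hand calculation
```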

5. Advantages of KDE

  • Flexible: Does not assume a specific distribution of data.
  • Smooth: Produces a continuous estimate.

6. Challenges of KDE

  • Bandwidth H: Choosing an appropriate H is critical.
    • Small H: May overfit, capturing noise.
    • Large H: May oversmooth, losing details.
  • Computationally Expensive: Requires evaluating kernel functions for all data points.

Summary

In this example, the estimated density at $z = 2.5$ is $p(z) \approx 0.238$. Kernel Density Estimation is a powerful tool for non-parametric density estimation, but it requires careful parameter tuning.